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1. Introduction 

In this paper we are concerned with the reconstruction of an unknown gray- valued image 
G L'^{n) with n = [0, 1]2 given the data 

Yij = {Ku%j +Sij, 1 < z < m, 1 < j < n. (1) 

For the moment, we assume that Sij are independent and identically distributed Gaussian 
random variables with E{sn) = and E (sj-^) = > and that K : L'^{n) M^^^ is 
a linear and bounded operator. K is assumed to model image acquisition and sampling at 
the same time, i.e. {Ku)ij is assumed to be a sample at the pixel {i/m^j/n) of a smoothed 
version of u. Throughout the paper we will assume that is known (for reliable estimation 
techniques for cr^ see e.g. [30j and references therein). 

A popular approach for computing a stable approximation of vP from the data Y given in 
the Gaussian model ([T]) consists in minimizing the penalized least squares functional, i.e. 

u{\) G argmin ^ ^ \{Ku)ij - Y,jf + J{u) (2) 

where J : L^{n) is a convex and lower-semicontinuous regularization functional and 

A > a suitable multiplier. In the seminal work [33j, for example, the authors proposed the 
total variation semi-norm 

JW = < (3) 
I +OC else 

as a penalization functional which has been a widely used model in imaging ever since. Here, 
\Du\ {Vt) denotes the total variation of the (measure- valued) gradient of u which coincides with 
\S/u\ if u is smooth. Numerous efficient solution methods for ([6]) [71 [lOl [26] and various 
modifications have been suggested so far (cf. [HI [20l EU [36] to name but a few). In particular, 
in order to accelerate numerical algorithms and to prevent oversmoothing the total variation 
semi-norm is often augmented to 



J{u) +-f [ (4) 



with 7 > 0. 

The quadratic fidelity in ([2]) has an essential drawback: The information in the residual is 
incorporated globally, that is each pixel value {Ku)ij — Yij contributes equally to the estimator 
u{X) independent of its spatial position. In practical situations this is clearly undesirable, since 
images usually contain features of different scales and modality, i.e. constant and smooth 
portions as well as oscillating patterns both of different spatial extent. A solution u{X) of ([2]) 
is hence likely to exhibit under- and oversmoothed regions at the same time. 

Recently, spatially-adaptive reconstruction approaches became popular that are based on 
([2j) with a locally varying regularization parameter, i.e. 

u{X) G argmin - 

However, the choice of the multiplier function Xij is subtle and different approaches have been 
suggested. See for instance pTl [22l [TTl [27] . 

In this paper we take a different route to achieve spatial adaption which allows to "localize" 
any convex functional J by minimizing it over a convex set that is determined by the statistical 
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extreme value behaviour of the residual process. More precisely, we study estimators u of 
that are computed as solutions of the convex optimization problem 

inf J(u) s.t. max^ V \(Ku)ii-YiA^<l. (6) 

Here, S denotes a system of subsets of the grid G = {1, . . . , m} x {1, . . . , n} and {cs : S ^ S} 
is a set of positive weights that govern the trade-off between data-fit and regularity locally on 
each set S G S. Solutions of (|6]) are special instances of statistical multiresolution estimators 
(SURE) as studied in [I9]. In this context the statistic T : R^><^ M defined by 

r(i;)=max^ V E M^^^ (7) 

is referred to as multiresolution (MR) statistic. Summarizing, an SMRE u of vP is an element 
with minimal J among all candidate estimators u that satisfy the condition T{Ku — Y) < 1. 

Special instances of ([6j) have been studied recently: For the case when S contains the entire 
domain G only, it has been shown in [8] that ([6j) is equivalent to ([2j) if K satisfies certain 
conditions. As mentioned above, this approach is likely to oversmooth small-scaled image 
features (such as texture) and/or underregularize smooth parts of the image. An improved 
model was proposed in [2] where S is chosen to consist of a (data-dependent) partition of G 
that is obtained in a preprocessing step (for the numerical simulations in Mumford-Shah 
segmentation is considered). Under similar conditions on as in [8], it was shown in [2] 
that (Ej) is equivalent to (Ej) where Xij is constant on each S ^ S. This approach was further 
developed in [Ij where a subset S C G is fixed and afterwards S is defined to be the collection 
of all translates of S (in fact, the authors study the convolution of the squared residuals with 
a discrete kernel). The authors propose a proximal point method for the solution of ([6]). 
This approach of local constraints w.r.t. a window (or kernel) of fixed size was also studied 
in [15] for irregular sampling and regularization functionals other than the total variation 
were considered. In particular, it is observed that the difference between results obtained by 
using the total variation penalty ([3j) and the Dirichlet-energy (integrated squared norm of the 
derivative) is not so big when using local constraints. This is in accordance to findings in [18] 
for one-dimensional signals. In the model of [1] was studied in the continuous function 
space setting. Moreover the authors in [TT] provided a fast algorithm for the solution of 
the constrained optimization problem based on the hierarchical decomposition scheme [34] 
combined with the unconstrained problem ([5j). 

In this paper, we propose a novel, automatic selection rule for the weights cs based on 
a statistically sound method that is applicable for any pre-specified, deterministic system of 
subsets S. We are particularly interested in the case when S constitutes a highly redundant 
collection of subsets of G consisting of overlapping subsets of different scales. This is a 
substantial extension to the approaches in [Tl[15l[TT] that only consider one fixed (pre-defined) 
scale. Our approach will amount to select a single parameter a G [0, 1] with the interpretation 
that the true signal satisfies the constraint in ([6j) with probability a. From the definition 
of ([6]) it is then readily seen that P (J{u) < J{u^)) ^ <^ for any solution of ([6j). In other 
words, our method controls the probability that the reconstruction u is at least as smooth 
(in the sense of J) as the true image u^. To this aim, it will be necessary to gain stochastic 
control on the null-distribution T{e), where e = {^ij} is a lattice of independent A/'(0,cr^)- 
distributed random variabels. 
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Moreover, for the efficient solution of ([6j) we extend the algorithmic ideas in [19] and propose 
a combination of an inexact alternating direction method of multipliers ( ADMM) [T3t [9] with 
Dykstra's projection algorithm [4J. Finally, we indicate how our approach can be applied to 
image deblurring problems in fluorescence microscopy where the observed data does not flt 
into the white noise model ([T]) but where one usually assumes that independently 

Yij - Pois ({Ku)ij) , 1 < i < m, 1 < j < n (8) 

Here, Pois(/3) stands for the Poisson distribution with parameter /3 > 0. We mention that 
similar models occur in positron emission tomography (cf. [35]) and large binocular telescopes 
(cf. [3]) and we claim that our method can be useful there as well. We apply Anscombe's 
transform to transform the Poisson data to normality. Furthermore we present a modifled 
version of the ADMM to solve the resulting variant of (Ej) . We flnally illustrate the capability 
of our approach by numerical examples: Image denoising, deblurring and inpainting and 
deconvolution problems that arise in nanoscale fluorescence microscopy. 

In the following, we denote by IS'I the cardinality of 5 G 5. We often refer to l^l as the 
scale of S. We assume that m,n G N are flxed and denote by (•,•) and ||-|| the Euclidean 
inner-product and norm on R^^^ and by \\u\\^2 the L^-norm of u. For a convex functional 
J : L^{n) ^ W the subdifferential dJ{u) is the set of ah ^ G L^{n) such that J{v) > 
J{u) + Jq£,{v — u) for all v G L^(fi). The Bregman-distance between v^u ^ L^(ri) w.r.t. 
^ G dJ{u) is deflned by 

D^j{v, u) = J{u) - J{v) - [ ^{u-v)> 0. 

If additionally 77 G dJ{v) we deflne the symmetric Bregman distance by DY^{u,v) = 
By J* we denote the Legendre-Fenchel transform of J, i.e. J*(g) = 
^^PiiGL2(Q) Jq^Q ~ J{u). We flnally note that it would not be restricitve (yet less intuitive) 
to replace L^(fi) by any other separable Hilbert space. 

2. Statistical Multiresolution Estimation 

We review sufficient conditions that guarantee existence of SMREs, solutions of (Ej) that is. 
To this end, we rewrite (Ej) into an equality constrained problem and study the corresponding 
augmented Lagrangian function (Section l2.ip . Moreover, we address the important question 
on how to choose the scale weights cs automatically in Section 12.21 Finally, we discuss 
different choices for the system S that have proved feasible in practice in Section 12.31 

2.1. Existence of SMRE. For the time being, let {cs : 5 G 5} be a set of positive real 
numbers. We rewrite (Ej) to an equality constrained problem by introducing the slack variable 
V G W^^'^, To be more precise, we aim for the solution of 

inf J{u)-\-H{v) s.t. Ku-v^Q (9) 
where H denotes the indicator function on the feasible set C of (Ej), i.e. 

C^{v^ R^^^ : T{v -Y)<1] and H{v) ^ ^ ^ _ ^^q) 

+OC else 
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Problems of type (|9]) were studied extensively in [16, Chap. III]. There, Lagrangian multiplier 
methods are employed to solve ([9j) . Recall the definition of the augmented Lagrangian of ([9j) : 

1 9 

Lx{u,v;p) = — \\Ku - v\\ + J{u) + H{v) + {p,Ku-v) , A > 0. (11) 
Za 

Here p E M^^^ denotes the Lagrange multiplier for the linear constraint in (j9j). Note that 
Lx equals the ordinary Lagrangian L(u, v]p) = J{u) + H{v) + {p, Ku — v) augmented by the 
quadratic term \\Ku — v\\^ /(2A) that fosters the fulfillment of the linear constraint in (|9j). 

It is well known that the saddle-points of L and Lx coincide (cf. [161 Chap III Thm. 2.1]) 
and that existence of a saddle point of Lx follows from existence of solutions of ([9]) together 
with constraint qualifications of the MR-statistic T. One typical example for the latter is 
given in Proposition 12. 1[ The result is rather standard and can be deduced e.g. from [12l 
Chap III, Prop 3.1 and Thm. 4.2] (cf. also fT6!, Chap III]) 

Proposition 2.1. Assume that (|9j) has a solution {u^v) G L^(0) xM^^^ and that there exists 
u G L^(fi) such that J{u) < oc and T{Ku — y) < 1 (Slater's constraint qualification). Then, 
there exists p G W^^^ such that {u,v,p) is a saddle point of Lx, i.e. 

Remark 2.2. (1) If m G L'^{n) and v^p ^ ]^mxn Proposition 12. H then u is an 

SMRE, i.e. it solves (|6j). Moreover, the following extremality relations hold: 

—K^'p G dJ{u), p G dH{v) and Ku — v. 

(2) Slater's constraint qualification is for instance satisfied if the set 

[Ku : u G l?{Vt) and J{u) < oc} 

is dense in R^^^. 

(3) If J is chosen to be the total variation semi-norm (|3j), then a sufficient condition for 
the existence of solutions of (|9j) will be that there exists (i, j) G S for some 5 G 5 
such that {Kl)ij ^ 0, where 1 G L^(fi) is the constant 1-function. This is immediate 
from Poincare's inequality for functions in BV(fi) (cf. |37[ Thm. 5. 11.1]). 

2.2. An a priori parameter selection method. The choice of the scale weights cs in 
(|6j) is of utmost importance for they determine the trade-off between smoothing and data-fit 
(and hence play the role of spatially local regularization parameters). We propose a statistical 
method that is based on quant ile values of extremes of transformed distributions. 

We pursue the strategy of controlling the probability that the true signal satisfies the 
constraint in (j6]). To this end, observe that for 5 G 5 the random variable 

{i,j)es 

is x^-distributed with |5| degrees of freedom (d.o.f.). With this notation, it follows from ((T]) 
that satisfies the constraints in © if 

T{Ku° - y) = T{e) = max est sis) < 1. 

Additionally, we require that the maximum above is balanced in the sense that the probability 
for csts{e) > 1 is equal for each 5 G 5. 
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To achieve this, we first aim for transforming ts{£) to normahty. It was shown in [23] that 
the fourth root transform ^/tsi^) is approximately normal with mean and variance 

IIS = V\S\-0.b and al = (syl^' 



respectively. The fourth root transform outperforms other power transforms in the sense that 
the Kullback-Leibler distance to the normal distribution is minimized, see [23]. In particular, 
it was stressed in [23] that the approximation works well for small d.o.f. Next, we consider 
the extreme value statistic 

max ^^-^^ . (12) 

SeS as 

We note that due to the transformation of the random variable ts{£) to normality each scale 
contributes equally to the supremum in (|T2]) . Hence a parameter choice strategy based on 
quantile values of the statistic (|T2]) is likely to balance the different scales occurring in S. We 
make this precise in the following 

Proposition 2.3. For a G (0, 1) and S ^ S let he the a-quantile of the statistic (fT2l) and 
set cs — {qa^S + f^s)~^' Then, any solution u of satisfies 

F{J{u) < J{u^)) > a. (13) 

Proof From ([T]) and from the monotonicity of the mapping x ^fx it follows that 

P {T[Ku^ -y)<^) = P {csts[e) < 1 V5 G 5) 



= P max — ^ Qa \ = 

yses as J 

In other words, the constants cs are chosen such that the true signal satisfies the constraints 
with probability a. By the fact that u is a, solution of (El it follows that F{T{Ku^ - < 
1) < F{J{u) < J{u^)). □ 

Remark 2.4. By the rule cs = {qa^'S ^ l-i^s)~^ in Prop osit ion 12 . 3 1 the problem of selecting the 
set of scale weights cs is reduced to the question on how to choose the single value a G (0, 1). 
The probability a plays the role of an universal regularization parameter and allows for a 
precise statistical interpretation: It constitutes a lower bound on the probability that the 
SMRE u is more regular (in the sense of J) than the true object u^. Moreover, one has that 



cstsie) > 1 ^ ^^^^ > qa, for ah SeS. 

Note, that the constraint in (|6]) can be rewritten into 

^ \{Ku)ij-Y,j\''<-^ for all 5 e 5. 

Since E (151-1 E(„)e5 4) = Var (iSP^ E(„)es 4) = [^P^ the factor 1/ 

{cs \S\) can be considered as a relaxation parameter, that takes into account the uncertainty 
of estimating the variance of the residual on finite scales l^l. Put differently, it is expected 
to be large on small scales and approaches 1 as l^l increases. This is illustrated in Figure [TJ 
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Here the quantity l/{cs \S\) is depicted for the system 5o of ah squares with sidelengths up to 
20 (left panel) and the system ^2 of all dyadic squares (middle panel) in an 341 x 512 image 
for a — 0.2 ('+') and a — 0.9 ('o'). It becomes clear, that only on the smallest scales there 
are non-negligible differences between the scale weights for 5o and ^2. Also our numerical 
experiments confirm, that reconstruction results do not differ very much for different choices 
of a. 




Figure 1. Left and middle: Dependence of relaxation factor {cs \S\)~^ on a 
and scale l^l. Right: Empirical density of the statistic (fT2l) . All values are 
computed for a 341 x 512 image and w.r.t. to 5o (solid) and ^2 (dashed). 

We note that in ^ and ^\ the authors propose relaxation parameters for the case when S 
consists of the translates of a window of fixed size. In [1] the authors fix such a parameter, 1.01 
say, and determine the corresponding window size by heurisitc reasoning. In [TT] the authors 
give for a fixed window size l^l a formula for a relaxation parameter that uses moments of the 
extreme value statistic of independent random variables with l^l degrees of freedom. We 
note that these methods can not be generalized to systems S that contains sets of different 
scales case in a straightforward manner. Our selection rule for the weights cs is designed 
such that different scales are balanced appropriately. Hence our approach is a multi-scale 
extension of the (single-scale) methods in [U [11] . 

Remark 2.5. It is important to note that the random variable ts{E:) and ts'{E:) are indepen- 
dent if and only if 5 H 5' = 0. As we do not assume that S consists of pairwise disjoint sets, 
(|T2]) constitutes an extreme value statistic of dependent random variables. Except for special 
cases, little is known about the distribution of such statistics (see e.g. [28l[29] for asymptotic 
results). It is an open and interesting problem to investigate the asymptotic properties of the 
distribution of the statistic in (|T2ll . 

In practice, the quantile values in Proposition 12.31 are derived from the empirical distri- 
bution of (0^2]) . The right panel in Figure [T] shows the empiricial density of the statistic (IT2]) 
for m — 341 and n — 512 and the systems 5o (solid) and ^2 (dashed) (for our simulations in 
Figure [T] we used 5000 trials) . 

2.3. On the choice of S. In the previous section we addressed the question on how to select 
the scale weights {cs'}^'^^ for a given system of subsets S of the grid G. Altough it is not 
the primary aim of this paper to advocate a particular systems 5, we will now comment on 
possible determinants for a rational choice of S. 
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Figure 2. Left: true signal u^. Right: noiys data Y with a = 0.1 




Figure 3. Solution of ([2]) u{X) with A = 0.075. 



On the one hand, S should be chosen rich enough to resolve local features of the image 
sufficiently well at various scales. On the other hand, it is desirable to keep the cardinality of 
S small such that the optimization problem in ([6]) remains solvable within reasonable time. 
As a consequence of this, a priori information on the signal should be employed in practice 
in order to delimit a suitable system S (e.g. the range of scales to be used). Furthermore we 
note that for guaranteeing that the extreme value statistic (|T2]) does not degenerate (as m, n 
and the cardinality of S increase), S typically has to satisfy certain entropy conditions (see 
e.g. [IS]). We stress that it is a challenging and interesting task to extend these results to 
random (data-driven) systems S. It is well known that such methods can yield good results 
in practice (see e.g. [2j). 

Here, we discuss two different choices of 5, namely: 

(1) the set of all discrete squares in G: for computational reasons usually subsystems are 
considered. We found the subset consisting of all squares with sidelengths up to 20 to 
be efficient. We denote this subset henceforth by Sq. 

(2) the set ^2 of dyadic partitions of G. For a quadratic grid G with m — n — 2^ the 
system ^2 is obtained by recursively splitting the grid into four equal squares until 
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the lowest level of single pixels is reached. To be more precise, 

52 = y |{A;2^...,(fc + l)2^}' : A; = 0, . . . , 2^-^ | . 
For general grids G the left and lower most squares are clipped accordingly. 




Figure 4. Oversmoothed regions identified on the scales \S\ =4,8 and 16 
(from left to right) for the system 5o (left column) and ^2 (right column). 

Obviously, 5o contains much more elements than ^2 and is hence likely to achieve a higher 
resolution. We indicate this by a numerical simulation. Figure [2] depicts the true signal 
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(left, at a resolution of m x n = 341 x 512 pixels and gray values scaled in [0, 1]) and data Y 
according to ^ with K = Id and a = 0.1. 

For illustrative purpose a global estimator u(X) in ([2]) for is computed that exhibits 
both over- and undersmoothed regions (here, we set A = 0.075). This estimator is depicted 
in Figure El The oversmoothed parts in u{X) can be identified via the MR-statistic T in ([7j) 
by marking those sets S' in 5 for which 

The union of these sets for the systems Sq (left column) and 5*2 (right column) are highlighted 
in Figure [4] where we examine the scales \S\ = 4,8, 16. The parameters cs are chosen as in 
Section O with a = 0.9. 

3. Algorithmic Methodology 

In what follows, we present an algorithmic approach to the numerical computation of 
SMRE in practice that extends the methodology in [TH] where we proposed an alternating 
direction method of multipliers (ADMM). Here, we use an inexact version of the ADMM 
which decomposes the original problem into a series of subproblems which are substantially 
easier to solve. In particular, an inversion of the operator K is no longer necessary. For this 
reason the inexact ADMM has attracted much attention recently (see. e.g. [Of [T^ [36]). 

3.1. Inexact ADMM. In order to compute the desired saddle point of the augmented La- 
grangian function Lx in (|TT]) . we use the inexact ADMM that can be considered as a modified 
version of the Uzawa algorithm (see e.g. [HI Chap. III]). Starting with some initial po G W^^^^ 
the original Uzawa algorithm consists in iteratively computing 

(1) {uk,Vk) G argmin^^L2(^7),x;GM-x-^A(^,^;Pfc-i) 

(2) pk = Pk-i + KKuk - Vk)- 

Item (1) amounts to an implicit minimization step w.r.t. to the variabels u and v whereas 
(2) constitutes an explicit maximization step for the Lagrange multiplier p. The algorithm is 
usually stopped once the constraint in ([9j) is fulfilled up to a certain tolerance. 

Applying this algorithm in a straightforward manner is known to be rather impractical 
(mostly due to the difficult minimization problem in the first step) and hence various mod- 
ifications have been proposed in the optimization literature. Firstly, we perform successive 
minimization w.r.t. u and v instead of minimizing simultaneously, i.e. given (uk-i^Vk-i^Pk-i) 
we compute 

(1) Uk G argmin^^L2(f7)^A(^,^/c-i;2^fc-i) 

(2) Vk G argmin^^i^mxn Lx{uk,v;Pk-i) 

(3) Pk = Pk-i + KKuk - Vk)- 

This is the well-known alternating direction method of multipliers (ADMM) as proposed in 
[ini Chap. III]). There, convergence of the algorithm has been studied for the case when J 
satisfies some regularity assumptions. In we extended this result for general functionals J 
(as for example the total variation semi-norm ([3])). The resulting two minimization problems 
usually can be tackled much more efficiently than the original problem. 

Still, the first subproblem above requires the inversion of the (possibly ill-posed) operator 
K. Thus, a second modification adds in the k-th loop of the algorithm the following additional 
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term to Lxiu,Vk-i;Pk-i)- 

^ (Cll^ - Uk-i\\l2 - \\K{u - Uk-i)\\^) . (14) 

Here, (° is chosen such that (° > After some rearrangements of the terms in Lx and 

(IT4|) it can easily be seen that Ku cancels out and thus the undesirable inversion of K is 
is replaced by a single evaluation of K at the previous iterate Uk-i. However, by adding 
(fT4l) the distance to the previous iterate Uk-i is additionally penalized and Lx{u,Vk]Pk-i) is 
minimized only inexactly. 

After the aforementioned rearrangements and by keeping in mind that H is the indicator 
function of the convex set C in (fTOl) . the inexact ADMM can be summarized as follows: 

Algorithm 1 Inexact ADMM 

Require: Y G M^><^ (data), A > (step size). 
uq ^ and vq = po ^ 0. 
for A: = 1, 2, . . . do 

k ^ 

Minimize Lx{-,Vk-i;Pk-i) + ^ (C ll' - ^/c-i|Il2 - \\K{- - ix^^-Of ): 

Uk ^ argmin^ \\u - {uk-i - C^K''{Kuk-i - Vk-i + Aj>/,_i)||^2 + ^J{u). (15) 

Minimize Lx(ukr',Pk-i)- 

Vk ^ proj {Kuk + Xpk-i) . (16) 
C 

Update dual variable: 

Pk ^ Pk-i + X~^{Kuk - Vk)- (17) 

end for 



In practice. Algorithm [T] is very stable and straightforward to implement, provided that 
efficient methods to solve (|T5l) and (|T6l) are at hand (cf. Section [3^21 below) . Moreover, it is 
equivalent to a general first-order primal-dual algorithm as studied in [9] where the following 
convergence result was established. 

Theorem 3.1. [9, Thm. 1] Assume that {u,v,p) is a saddle point of Lx- Moreover, let 
{uk,Vk,Pk)}keN generated by AlgorithmUlvjith ( > \\K\\'^ and define the averaged sequences 

Uk^^'^ui and Pk^^'^Pl- 

1=1 1=1 
Then, each weak cluster point of {uk}j.^f^ is a solution of (|6]) and there exists a constant 
C > such that 

D-/*\uk,u) + D%{Pk,p) < C/k. (18) 

The above result is rather general and in particular situations the assertions may be quite 
weak. In particular if J and i7* have linear growth^ as it is for instance the case for J 
as in ([3]) and H as in (|T0]) , the Bregman distances appearing in (|TH]) may vanish although 
{uk^Pk) 7^ (u^p). If at least one of the functional J or i7* is uniformly convex, it is possible to 
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come up with accelerated versions of Algorithm [T] that allow for stronger convergence results 
(see [9j). For the sake of simplicity we restrict our consideration to the basic algorithm. 

3.2. Subproblems. Closer inspection of Algorithm [T] reveals that the original problem - 
computing a saddle point of Lx - has been replaced by an iterative series of subproblems ([15]) 
and (IT6l) . We will now examine these two subproblems and propose methods that are suited 
to solve them. Here we proceed as in [19j. 

We focus on (|T6l) first. Note that the problem given there amounts to computing the 
orthogonal projection of Vk := Kuk + Xpk-i onto the feasible region C as defined in (|T0]) . 
Due to the supremum taken in the definition ([7]) of the statistic T, we can decompose C into 
^ ClseS where 



(19) 



i.e. each Cs refers to the feasible region that would result if S contained S only. Note that all 
Cs are closed and convex sets (in fact, they are circular cylinders in R^^^; see the left panel 
in Figure [5]). If we fix a Cs and consider some v ^ Cs, the projection of v onto Cs can be 
stated explicitly as 

y,, + {v,j - Y,,)a/^cs Zik,l)es(vki - Yu? if G 5. 

This insight leads us to the conclusion that any method which computes the projection 
onto the intersection of closed and convex sets by merely using the projections onto the 
individual sets only would be feasible to solve fp!6l) . Dykstra's Algorithm [4J works exactly 
in this way and is hence our method of choice to solve (fT6l) . For a detailed statement of 
the algorithm and how the total number of sets that enter it may be decreased to speed up 
runtimes, see [191 Sec. 2.3]. We note that despite these considerations, the predominant 
part of the computation time of Algorithm [T] is spent for the projection step (fT6]) . So far we 
did not take into account parallelization of the projection algorithm. To some extent this 
is possible in a straightforward manner, since the projections onto disjoint sets in S can be 
carried out simultaneously (on GPUs for instance). But also inherently parallel projection 
algorithms (including parallel versions of Dykstra's method) received much attention recently 
and potentially would yield a speed up of Algorithm [H See for instance [6j for an overview. 

We finally turn our attention to (fTSl) . In contrast to the standard version of the ADMM as 
proposed in [19] , the second subproblem in Algorithm [1] does not involve the inversion of the 
operator K. For this reason, (0^51) here simply amounts to solving an unconstrained denoising 
problem with a least-squares data- fit. Numerous methods for a wide range of different choices 
of J are available in order to cope with this problem. If J is chosen as the total variation 
seminorm, for example, the methods introduced in [71 [IHl [26] will be suited (we will use the 
one in [lOj). 

4. Application in Fluorescence Microscopy 

For image acquisition techniques that are based on single photon counts of a light emitting 
sample, such as fluorescence microscopy, the Gaussian error assumption ^ is not realistic. 
Here, the non-additive model ([8]) is to be preferred. Still, the estimation paradigm above can 
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be adapted to this scenario by means of variance stabilizing transformations. To this end we 
first recah [5, Lem. 1] 

Lemma 4.1 (Anscombe's Transform). Let Y ^ Pois(/3) with /3 > 0. Then, for ah c > 



Var (2VrT^) = 1 + ^-j^ + 0(/3 



-2\ 



Thus, the choice c = 3/8 is hkely to stabihze the variance of 2\/y + c at the constant value 
1 (in second order) and its mean at 2^/^ (in first order) or in other words it approximately 
holds that 

2^/Y + 3/8 - 2^ - AA(0, 1). 

It is obvious from Lemma I^T] that the choice c = 1/4 will result in a better reduction of the 
bias, since the mean is then stabihzed at 2^/^ in second order (at the cost of a less stable 
variance). Numerically, we found the difference to be negligible for our purposes. 

With the above considerations, it is straightforward to adapt the estimation scheme ([6j) 
to the present case: Let Xjj = ^^^Yij + 3/8. Then, we define a statistical multiresolution 
estimator u for the model (jSj) to be any solution of 



inf J{u) s.t. supcs* 



{Ku)ij > 0, for ah (i, j) G G. 



2 

< 1, (21) 



Note that for any c > the function t {Vt — c)^ is convex on [0, oo) and thus Problem (|2T]) 
is again a convex optimization problem. Similar as in Section 12.11 we can rewrite (|2T1) into 

inf J{u) + H{v) s.t. Ku-v^{), (22) 

where here H denotes the indicator function on the feasibility set C given by 

C = {^; G W""" : Vij > for ah (i, j) G G and T{2y/^ - X) < l} . (23) 

The right panel in Figure [5] depicts the sets C for the simple case m — 2 and n — 1. It is 
important to note, that in contrast to the Gaussian case (left panel), the feasibility sets C 
are not translation invariant, i.e. their shape and size depend on the data Y . In particular, 
the size increases with ||y|| which is due to the fact that the variance of a Poisson random 
variable with law Pois(;5) increases linearly with the parameter /3. 

In principle. Algorithm [1] can be directly applied to solve (f2T1) : The set C in the projection 
step (fTTll has to be replaced by the modified feasibility set C. Again, we observe that C = 
^S^S^S^ where 

|2 



Cs^lv^W^^^ : cs Yl |2V^-X,,f <1 

Using Dykstra's Algorithm amounts to compute the orthogonal projections onto the sets Cs 
which, in contrast to the Gaussian case, is no longer possible in closed form (except for the 
case when l^l = 1) and approximate solutions have to be used. Since during the runtime 
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Figure 5. Admissible domains for 1 x 2 "images" and data Yi = (1,1) and 
I2 = (7,4): Left: the sets C for the Gaussian case with cr^ = 1 (cf. (|T0|1 ). 



Right: the sets C for the Poisson case (cf. (l23]l ). The scale weights cs are 
chosen as proposed in Proposition 12.31 with a = 0.9. The gray lines delimit 
the sets Cs and for 5 E 5 = {(1, 0), (0, 1), (1, 1)}. 

of Algorithm [1] these projections are to be computed a considerable amount of time, this is 
clearly undesirable since inevitable numerical errors are likely to accumulate. 

As a way out, assume for the time being, that y G R^><^ is some estimator for \J Kv^ with 
yij > 0. Then, by Taylor expansion, we find for u G L^(fi) that 

{Ku)ij 



2 J {Ku)ij = yij + 



^0{\yl-{Ku) 



In place of (1211) . we propose to solve the linearized problem 

inf J{u) s.t. supc^ \Xij -yij - {Ku)ij/yij\^ <l, 



{Ku)ij > 0, for ah (z, j) G G. 
Similar as above, we rewrite ([241) into 

Ku 

— w 



(24) 



0. 



inf J{u) + Hy{w) s.t. 

wGL2(f7),ti;GM^x^ V 

where Hy is the indicator function on the feasible set Cy given by 

C^^[w^ R^><^ : Wij > for ah (ij) G G and T{w + y - X) < l} 

Clearly, the orthogonal projection onto Cy can again be computed efficiently by Dykstra's 
algorithm as outlined in Section [31 The corresponding augmented Lagrangian functional 
reads as 

1 2 

Lx^y{u,w;p) = — \\Ku/y-w\\ + J{u) + Hy{w) + {p, Ku/y - w) . (25) 

For any "good" estimator y for V Ku^ Algorithm [H could be applied immediately to find a 
saddle point {u^ w,p) of i^A,y, but usually such an estimator is not at hand. We hence propose 
to replace in the k-th loop of Algorithm [H the estimator y by \fKuk. To avoid instabilities 
we rather use ^max(Kix/e, 5) for some small positive parameter 5 > 0. We formalize these 
ideas in Algorithm [2l 
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Algorithm 2 ADMM for Poisson Noise 

Require: Y G M^><^ (data), A > (step size), 5 > 0. 
i(;o = ^ and init E W^^"" . 
for A: = 1, 2, ... do 

Minimize Lx^wu-ii'^'^k-i'^Pk-i)' 

1 2 

Uk <- ^■^gmiii-\\Ku/yk-i- {wk-i- \pk-i)\\ +\J{u). (26) 

wGL2(^7) ^ 

Minimize Lx^y^{uk, -^Pk-i)'- 

Wk ^ proj {Kuk/vk + ><Pk-i) • (27) 

Update dual variable: 

Pk ^ Pk-i + ^~^{Kuk/yk - ^/c)- (28) 

end for 



In practice the Algorithm has proved to be very stable, however it seems to be rather 
involved to prove numerical convergence of Algorithm [2] which is beyond the scope of this 
paper. If {u^ w^p) is a limit of the sequence (uk^ Wk^Pk) in Algorithm [21 it is quite obvious that 
{u,w'^) is a solution of (f22]l and hence that u solves (l2T]l . Moreover, it is not straightforward 
to incorporate a preconditioner similar to (IT4|) that renders the step (l26l) semi-implicit. 

5. Numerical Results 

We conclude this paper by demonstrating the performance of SMRE as computed by our 
methodology introduced in the previous Sections. We will treat the denoising problem in 
Paragraph 15. II as well as deconvolution and inpainting problems in Paragraph 15.21 Finally, we 
will study SMRE for the Poisson model ([Hj) computed by means of Algorithm [2] in Paragraph 
15.31 Here we will use real data from nanoscale fluorescence microscopy provided by the 
Department of NanoBiophotonics at the Max Planck Institute for Biophysical Chemistry in 
Gottinge 

When it comes down to computation, we think of an image as an m x n array of pixels 
rather than an element in L^{Q). Accordingly, the operator K is realized as an mn x mn 
matrix and V denotes the discrete (forward) gradient. In all our experiments we use a step 
size A = 0.001 for the ADMM method (Algorithm [1]) and we stop the iteration if the following 
criteria are satisfied 

II^^V^^^-^l' < 10-^ < 10-3 and maxi^ME^^HZIllif^ < i.oi,,. 

||y|| ~ ||y|| ~ SeS as 

Here, S is the system of subsets in use and t^, fis and as are defined as in Section [221 

For the Poisson modification in Algorithm [21 we use the same criteria, instead that Kuk — Vk 
is replaced by \fKuk — Wk and Kuk — 1^ by 2y^Kuk — A, where X is as in Section [H 



: / / www . mpibpc . mpg . de/ groups/hell/ | 
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Figure 6. Denoising results for 10% Gaussian noise. First row: - and 
Bregman oracle. Middle row: SA-TV with window size 11 and 19. Last row: 
SMREs w.r.t. Sq and 5^2 (with a = 0.9). 



5.1. Denoising. In this paragraph we consider data Y given by ([T]) when K is the identity 
matrix and is the test image in Figure [2] (m = 341 and n = 512), i.e. 

Yij = u^ij+Sij, (z, j) G G. 

The study the scenarios when a = 0.1 (10% Gaussian noise) and a = 0.2 (20% Gaussian 
noise). We compute SMRE based on the subsystems of Sq and 5^2 as introduced in Paragraph 
12.31 where we fixed a = 0.9. To this end we utilize Algorithm [T] with C — i-^- ^he standard 
ADMM. 

We compare our estimators to the global estimators u{X) (A > 0) as defined in (Ej). We 
choose A = A2 and A = Ab such that the mean squared distance and the mean symmetric 
Bregman distance to the true signal is minimized, respectively. To be more precise, we set 

A2 = E ( argmin \\u^ - u{X)\f) and Ab = E [ diTgmmDj'^{u^ ,u{X))) , (29) 

V A>0 / V A>0 / 
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Figure 7. Denoising results for 20% Gaussian noise. First row: - and 
Bregman oracle. Middle row: SA-TV with window size 11 and 19. Last row: 
SMREs w.r.t. Sq and 5^2 (with a = 0.9). 



where the symmetric Bregman distance for J as in ([3]) formally reads as 

^ E(iv««l + iv%l)(i-^^). 

^^^^^^^ V \^u,j\\\/v,j\j 

This means that D^^^u^v) is small if for sufficiently many pixels G G either both u 

and V are constant in a neighborhood of (z, j) or the level lines of u and v at (i, j) are locally 
parallel. In practice, we rather use 

J{u)= ^ ^|Vni,-|i + /32 
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Figure 8. Reconstruction details for 10% (left frame) and 20% (right frame) 
Gaussian noise. From top to bottom: True signal, data, L^- and Bregman 
oracle, SA-TV with window size 11 and 19, SMRE w.r.t. So and ^2 and 
a = 0.9 
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instead of J in (|3]) for some small constant /3 ^ 10~^. Then the above formulae are slightly 
more complicated. Since the parameters A2 and Ab are not accessible in practice as is 
unknown, we refer to u{X2) and u{Xb) as L^- and Bregman- oracle, respectively. Simulations 
lead values A2 = 0.026,0.0789 and Ab = 0.0607,0.1767 for a = 0.1,0.2, respectively. 

In addition, we compare our approach to the spatially adaptive TV (SA-TV) method as 
introduced in [llj. The SA-TV algorithm approximates solutions of (|6j) for the case where 
S constitutes the set of all translates of a fixed window S C G (cf. also [Jp) by computing a 
solution of ([5j) with a suitable spatially dependent regularization parameter A. Starting from a 
(constant) initial parameter A = Aq the SA-TV algorithm iteratively adjusts A by increasing it 
in regions that were poorly reconstructed in the previous step. For our numerical comparisons, 
we used the SA-TV- Algorithm considering square windows with side lengths 11 (as suggested 
in [11]) and 19. All parameters involved in the algorithm were chosen as suggested in [H]. In 
particular we set Aq = 0.5 and choose an upper bound for A of L = 1000 in all our simulations. 
As a stopping condition, we used the discrepancy principle which ended the reconstruction 
process after exactly four iteration steps in all of our experiments. 

The reconstructions are displayed in Figure [6] (a = 0.1) and Figure El {cr = 0.2). By visual 
inspection, we find that the oracles are globally under- (L^) and over-regularized (Bregman), 
respectively. While the scalar parameter A was chosen optimally w.r.t. the different distance 
measures, it still cannot cope with the spatially varying smoothness of the true object u^. 

In contrast, SMRE and SA-TV reconstructions exhibit the desired locally adaptive be- 
haviour. Still, the SMRE as formulated in this paper has the advantage that multiple scales 
are taken into account at once, while SA-TV only adapts the parameter on a single given 
scale. As a result, SA-TV reconstructions are of varying quality for finer and coarser features 
of the object, while the SMRE is capable of reconstructing such features equally well. This 
becomes particularly obvious when zooming into the reconstructions (cf. Figure [8j). 

5.2. Deconvolution & Inpainting. We next investigate the performance of our approach 
if the operator K in ([1]) is non-trivial. To be exact, we consider inpainting and deconvolution 
problems. For the first we consider an inpainting domain that occludes 15% of the image with 
noise level a = 0.1 (upper left panel in Figure [9]) and for the latter a Gaussian convolution 
kernel with variance 2 and noise level a = 0.02 (lower left panel in Figure [9j). 

For all experiments we use the dyadic system ^2 and a = 0.9. Note that in both cases we 
have K ^ K"" and \\K\\ = 1; we therefore set ^ = 1-01 in (|T5]1 . The results are depicted in the 
upper right and lower right images of Figure [9l respectively. 

Again, the results indicate that a reasonable trade-off between data fit and smoothing is 
found by the proposed a priori parameter choice rule and that the amount of smoothing is 
adapted according to the image features. 

5.3. Examples from Fluorescence Microscopy. We finally study the performance of our 
approach in a practical application, namely fluorescence microscopy. To be more precise, we 
consider deconvolution problems for standard confocal microscopy and STED (STimulated 
Emission Depletion) microscopy. Both examples have in common, that the recorded data is 
a realization of independent Poisson variables where the intensity at each pixel is determined 
by a blurred version of true signal. In other words. Model (|Hj) applies. In both cases the 
blurring can be modelled (in first order) as a convolution with a Gaussian kernel where the 
width of the kernel for confocal microscopes is 3 — 4 times larger than it is for STED. As 
standard references we refer to [32] (confocal microscopy) and to [25l[2l] (STED). 
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Figure 9. Inpainting (top): data Y with a = 0.1 (left) and SMRE (right). 
Deconvolution (bottom): data Y with a = 0.02 (left) and SMRE (right). 



In both cases, we will use sample images of PtK2 cells taken from the kidney of potorous 
tridactylus^ where beforehand the protein /3-tubulin was tagged with a fluorescent marker. 
What becomes visible is the microtubule part of the cytosceleton of the cells. The left panels 
in Figures [101 and [12] show the confocal and STED recordings, respectively. Both sample 
images show an area of 18 x 18|am^ at a resolution of 798 x 798 pixels. As a regularization 
functional we use in both cases a combination of the total variation semi-norm and the L - 
norm as in ^ with 7=1. 

5.3.1. Confocal microscopy. Figure [TUl depicts a confocal recording of a PtK2 cell (left) and 
the solution of (|2T|) computed by Algorithm [2] (right). We have used the subset of Sq with 
maximal side length of 20 pixel and the scale weights cs are chosen as in Proposition 12. 31 with 
a = 0.9. For the convolution kernel we assume a full width at half maximum of 230nm which 
corresponds to a standard deviation of 4.3422 pixels. Due to this relatively large kernel, the 
impact of the deconvolution is clearly visible. 

This becomes remarkably apparent when one takes a closer look: In Figure [TT] a zoomed-in 
area of the data (left) and the SMRE (right) is depicted. Clearly, the reconstruction reveals 
details that are not visible (or at least not apparent) in the data. In particular, the individual 
microtubule-filaments are separated properly. 

We finally remark, that we proposed a different multi-scale deconvolution method for con- 
focal microscopy in [19]. In contrast to this work, we there used a different MR-statistic than 
T in ([7]) and we used the standardization (Y — 13)/ in order to transform the Poisson data 
Y to normality. The performance of the two approaches for confocal recording is comparable 
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Figure 10. Confocal deconvolution: data Y (left) and SMRE (right). 




Figure 11. Confocal deconvolution (closer look): data Y (left) and SMRE (right). 



since the image intensity (=photon count rate) is relatively high throughout the data and 
hence standardization yields a fair approximation to normality. However, for low-count Pois- 
son data, as we will investigate in the following section, our new approach is clearly preferable, 
mostly due to the fact that Anscombe's transform also works well for small intensities (cf. 
0). 

5.3.2. STED microscopy. The left panel of Figure [l2] depicts a STED recording of a PtK2 
cell. For the convolution kernel a full width at half maximum of 70nm is assumed, which 
corresponds to a standard deviation of approximately 1.1327 pixels. The right image of 
Figure [12] depicts an approximate solution of (|2T|) that was computed by Algorithm [2l Again 
we use the system S of all squares in Sq up to a maximal side length of 20 pixels and the 
thresholds cs are chosen according to Proposition fl2.3|) with a = 0.9. Due to the relatively 
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small convolution kernel, the impact of the deconvolution is less striking as e.g. 
confocal recording in Paragraph 15.3. 1[ 



for the 




Figure 12. STED deconvolution: data Y (left) and SMRE (right). 



When zooming into the image, the effect becomes more obvious: The left and middle panel 
in Figure [13] depicts a detailed view on the STED data and the SMRE from Figure [121 The 
right panel in Figure [13] shows the global reconstruction 'Ug, i.e. we computed a solution of 
(l2Tll w.r.t. to the trivial system S = {G} and the parameter cq as in Proposition 12.31 with 
a = 0.9. The global reconstruction exhibits typical concentration phenomena (especially in 
the upper half of the image) that are due to the ill-posedness of the deconvolution. These 
artefacts are less prominent for the SMRE solution. 

At the same time, some parts of Ug are oversmoothed. In Figure [14] the union of the sets 
S ^ So where 

cs Yl 

are highlighted, where we restrict our consideration to the squares with a sidelength of 7 
pixel. A closer view on the highlighted subset confirms the oversmoothing (lower zoom-box). 
Again we note that at the same time the global image reconstruction has artefacts due to the 
ill-posedness of the deconvolution (upper zoom-box). These can only be avoided by invoking 
stronger regularization. A comparison with the corresponding details of the SMRE (right) 
shows that our locally adaptive approach lacks this undesirable behaviour. 



2\Mj + 3/8-2/^ 



> 1 



6. Conclusion 

In this paper we show how statistical multiresolution estimators, that is solutions of ([6]), 
can be employed for image reconstruction. We stress that our method, combined with a new 
automatic selection rule, locally adapts the amount of regularization according to the multi- 
scale nature of the image features. For the solution of the optimization problem ([6]) we suggest 
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Figure 13. STED deconvolution (detail): data Y (left), SMRE (middle) 
global reconstruction (right) 









i 


3 



Figure 14. Oversmoothed regions in the global estimator Ug identified on the 
scales \S\ = 7. Zoomed-in regions show over- (lower zoom-box) and underreg- 
ularized (upper zoom-box) parts in the image that do not occur in the SMRE 
reconstruction (right boxes). 



an inexact alternating direction method of multipliers combined with Dykstra's projection 
algorithm. We show how this estimation paradigm can be extended to the Poisson model 
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which opens up a vast field of applications, such as Poisson nanoscale fluorescence microscopy. 
Aside to this application, the performance of our method is illustrated for standard problems 
in imaging such as denoising and inpainting. 
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